On Policy Learning in Restricted Policy Spaces
Authors
Abstract
We consider the problem of policy learning in a Markov Decision Process (MDP) where only a restricted subset of the full policy space can be used. An MDP consists of a state space S, a set of actions A, a transition probability function t(s, a, s′), a reward function R : S → ℝ, and a discount factor γ. The problem is to find a policy, i.e. a mapping from states to actions π : S → A, which achieves the highest discounted return 𝔼 ∑_{i=1}^{∞} γ^i R(s_i) (where s_i denotes the state encountered at time step i) for every possible start state. However, we are not interested in arbitrary policies, but only in a restricted subset Π of the full policy space. We assume that there exists a policy π ∈ Π which is best for every state s ∈ S compared to the other policies in Π; it is not required that the true optimal policy of the MDP belongs to Π. In some settings we also consider stochastic policies, which map states to a probability distribution over the action set. This greatly increases the size of the policy search space.
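As a concrete, purely illustrative sketch of this setting, the Python snippet below builds a small toy MDP, evaluates every policy in a hand-picked restricted set Π by iterative policy evaluation, and checks the assumption stated above that a single policy in Π is best at every start state. All names and numbers here (the toy transition matrix, evaluate_policy, the three candidate policies) are assumptions for illustration, not part of the paper.

```python
import numpy as np

# Toy MDP (illustrative assumption): 3 states, 2 actions.
# T[a][s, s'] plays the role of t(s, a, s'); R[s] is the reward of state s.
n_states, gamma = 3, 0.9
T = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],  # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.2, 0.0, 0.8]],  # action 1
])
R = np.array([0.0, 1.0, 2.0])

def evaluate_policy(pi, tol=1e-8):
    """Iterative policy evaluation for a deterministic policy pi: S -> A."""
    V = np.zeros(n_states)
    while True:
        V_new = R + gamma * np.array([T[pi[s], s] @ V for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Restricted policy space Pi: only these deterministic policies may be used.
Pi = [np.array([0, 0, 0]), np.array([1, 1, 1]), np.array([0, 1, 0])]
values = np.stack([evaluate_policy(pi) for pi in Pi])  # shape: (|Pi|, |S|)

# Check the assumption: one policy in Pi attains the best value at every state.
best = values.argmax(axis=0)
print("values per policy:\n", values)
print("a single policy dominates at every start state:", np.all(best == best[0]))
```

If the final check fails for a given Π, the dominance assumption made in the abstract simply does not hold for that particular restricted policy set.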
Similar Articles
Policy Capacity in the Learning Healthcare System; Comment on “Health Reform Requires Policy Capacity”
Pierre-Gerlier Forest and his colleagues make a strong argument for the need to expand policy capacity among healthcare actors. In this commentary, I develop an additional argument in support of Forest et al's view. Forest et al rightly point to the need to have embedded policy experts to successfully translate healthcare reform policy into healthcare change. Translation of externally generated i...
Policy Reuse for Transfer Learning Across Tasks with Different State and Action Spaces
Policy Reuse is a reinforcement learning method in which learned policies are saved and reused in similar tasks. The policy reuse learner extends its exploration to probabilistically include the exploitation of past policies, with the outcome of significantly improving its learning efficiency. In this paper we demonstrate that Policy Reuse can be applied for transfer learning among tasks in dif...
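As a rough sketch of the exploration scheme described above (a generic illustration under assumed names and parameters, not the authors' exact algorithm), action selection could mix exploitation of a saved past policy with ε-greedy behaviour on the new task's Q-values:

```python
import random

def reuse_action(state, past_policy, Q, actions, psi=0.5, epsilon=0.1):
    """Probabilistic policy reuse (illustrative): with probability psi follow the
    saved past policy; otherwise act epsilon-greedily on the new Q-values.
    Q is assumed to be a dict keyed by (state, action); psi and epsilon are
    assumed parameter names, not taken from the paper."""
    if random.random() < psi:
        return past_policy(state)                      # exploit the past policy
    if random.random() < epsilon:
        return random.choice(actions)                  # explore
    return max(actions, key=lambda a: Q[(state, a)])   # exploit new knowledge
```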
Regularized Policy Iteration with Nonparametric Function Spaces
We study two regularization-based approximate policy iteration algorithms, namely REG-LSPI and REG-BRM, to solve reinforcement learning and planning problems in discounted Markov Decision Processes with large state and finite action spaces. The core of these algorithms is the regularized extension of Least-Squares Temporal Difference (LSTD) learning and Bellman Residual Minimization (BRM),...
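As a minimal, generic illustration of a regularized LSTD solve (an ℓ2-penalized, linear-features sketch with assumed argument names, not the nonparametric REG-LSPI/REG-BRM formulation studied in that paper):

```python
import numpy as np

def regularized_lstd(phi, phi_next, rewards, gamma=0.99, lam=1e-2):
    """l2-regularized LSTD: solve (Phi^T (Phi - gamma * Phi') + lam * I) w = Phi^T r
    for the weights w of a linear value estimate V(s) ~ phi(s) @ w.
    (Generic sketch with assumed argument names.)"""
    A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(phi.shape[1])
    b = phi.T @ rewards
    return np.linalg.solve(A, b)

# Tiny usage example with random features (purely illustrative).
rng = np.random.default_rng(0)
phi = rng.normal(size=(100, 5))        # features of visited states
phi_next = rng.normal(size=(100, 5))   # features of successor states
rewards = rng.normal(size=100)
print("value-function weights:", regularized_lstd(phi, phi_next, rewards))
```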
Probabilistic Policy Reuse for Inter-Task Transfer Learning
Policy Reuse is a reinforcement learning technique that efficiently learns a new policy by using similar policies learned in the past. The Policy Reuse learner improves its exploration by probabilistically including the exploitation of those past policies. Policy Reuse was introduced, and its effectiveness previously demonstrated, in problems with different reward functions in the same state and action ...
Continuous-Action Reinforcement Learning with Fast Policy Search and Adaptive Basis Function Selection
As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In t...